Reducing Data Stream Sliding Windows by Cyclic Tree-Like Histograms
Abstract
Data reduction is a basic step in a KDD (knowledge discovery in databases) process, useful for delivering more concise and meaningful data to successive stages. When mining is applied to data streams, which are continuous data flows, reducing them suitably is of particular interest: it enables effective approaches that require multiple scans of the data, since these scans can be performed over one or more reduced sliding windows. Sum range queries are a class of queries whose importance in the context of KDD is widely accepted. In this paper we propose a histogram-based technique for reducing sliding windows that supports approximate arbitrary (i.e., non-biased) sum range queries. The histogram is based on a hierarchical structure (as opposed to the flat structure of traditional ones) and is therefore suitable for directly supporting hierarchical queries and, thus, drill-down and roll-up operations. In addition, the structure supports both sliding window shifting and quick query answering well (both operations are logarithmic in the sliding window size). Experimental analysis shows that our method is more accurate than the state-of-the-art histogram-based sliding window reduction techniques.
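To make the "logarithmic shift and logarithmic query" claim concrete, here is a minimal sketch of a sum tree laid over a cyclic (ring) buffer. It is not the histogram proposed in the paper: this version stores every item and returns exact sums, whereas the paper's structure is a compressed, approximate summary. The class name `CyclicSumTree` and its methods are hypothetical and chosen only for illustration.

```python
class CyclicSumTree:
    """Exact sum tree over a fixed-size sliding window kept in a ring buffer.

    Illustration only (assumption): the paper's histogram is approximate and
    compressed; this sketch keeps every item to show the cyclic tree layout.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.n = window_size
        # Leaves live at tree[n .. 2n-1]; internal nodes hold sums of children.
        self.tree = [0] * (2 * window_size)
        self.head = 0   # ring-buffer slot holding the oldest item
        self.count = 0  # number of items currently in the window

    def _update(self, slot, value):
        # Overwrite one leaf, then refresh its ancestors: O(log n).
        i = slot + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def _slot_sum(self, lo, hi):
        # Sum of ring-buffer slots in the half-open range [lo, hi): O(log n).
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                total += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

    def push(self, value):
        # Shift the window: once full, the newest item overwrites the oldest.
        if self.count < self.n:
            slot = (self.head + self.count) % self.n
            self.count += 1
        else:
            slot = self.head
            self.head = (self.head + 1) % self.n
        self._update(slot, value)

    def range_sum(self, i, j):
        # Sum of window items i..j inclusive, where index 0 is the oldest item.
        if not 0 <= i <= j < self.count:
            raise IndexError("range outside the current window")
        lo = (self.head + i) % self.n
        hi = (self.head + j) % self.n
        if lo <= hi:
            return self._slot_sum(lo, hi + 1)
        # The logical range wraps around the end of the ring buffer.
        return self._slot_sum(lo, self.n) + self._slot_sum(0, hi + 1)


window = CyclicSumTree(4)
for value in [1, 2, 3, 4, 5, 6]:
    window.push(value)
print(window.range_sum(1, 2))  # sum of the 2nd and 3rd oldest items in the window
```

Because the buffer is cyclic, shifting the window never moves data: a new arrival replaces the oldest leaf and only the ancestors of that leaf are recomputed. A query over a logical range that wraps around the buffer end is split into two physical ranges.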
Similar resources
Approximating sliding windows by cyclic tree-like histograms for efficient range queries
The issue of providing fast approximate answers to range queries on sliding windows with a small consumption of storage space is one of the main challenges in the context of data streams. On the one hand, the importance of this class of queries is widely accepted. They are indeed useful to compute aggregate information over the data stream, allowing us to extract from it more abstract knowledge...
Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams
Equi-depth histograms represent a fundamental synopsis widely used in both database and data stream applications, as they provide the cornerstone of many techniques such as query optimization, approximate query answering, distribution fitting, and parallel database partitioning. Equi-depth histograms try to partition a sequence of data in a way that every part has the same number of data items....
Research on Sliding Window Join Semantics and Join Algorithm in Heterogeneous Data Streams
Sliding windows of data stream have rich semantics, which results all kinds of window semantics of different data stream, so join semantics between the different types of windows becomes very complicated. The basic join semantic of data streams, the join semantic of tuple-based sliding window and the join semantic of time-based sliding window have partly solved the semantics of stream joins, bu...
A Single-scan Algorithm for Mining Sequential Patterns from Data Streams
Sequential pattern mining (SPAM) is one of the most interesting research issues of data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbound sequence of data elements arriving at a rapid rate. Based on the characteristics of data streams, the problem complexity of mining data streams for sequential patterns is more ...
Processing Continuous Historical Queries over XML Update Streams
We address the problem of processing continuous historical queries over streams of XML data, returning continuous, exact answers. The stream data considered are tokenized XML data with embedded updates for inserting, removing, or replacing stream subsequences that correspond to complete XML tree nodes when they are fully materialized to trees. Our query language for expressing historical querie...